Named Entity Recognition in Crime News Documents Using Classifiers Combination
Abstract
The increasing volume of crime information readily available on the web makes it very difficult to retrieve, analyze, and use the valuable information in such texts manually. This work focuses on designing models for extracting crime-specific information from the web. To this end, the paper proposes an ensemble framework for the crime named entity recognition task. The main aim is to efficiently integrate feature sets and classification algorithms to synthesize a more accurate classification procedure. First, three well-known text classification algorithms, namely Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor, are employed as base classifiers for each of the feature sets. Second, a weighted voting ensemble method is used to combine these three classifiers. To evaluate the models, a manually annotated data set obtained from BERNAMA is used. Experimental results demonstrate that an ensemble model is an effective way to combine different feature sets and classification algorithms for better classification performance. The ensemble model achieves an overall F-measure of 89.48% for identifying crime type and 93.36% for extracting crime-related entities. The ensemble model trained with suitable features outperforms the baseline models.
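The weighted voting step can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or which entity labels are used, so the weights (imagined here as per-classifier validation scores), the label names, and the function name `weighted_vote` are all assumptions made for illustration only, not the authors' implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per base classifier into a single label.

    predictions: dict mapping classifier name -> predicted label
    weights:     dict mapping classifier name -> non-negative weight
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        # Each classifier adds its weight to the label it voted for.
        scores[label] += weights[name]
    # The label with the highest total weight wins; iterating over the
    # sorted labels makes ties resolve to the alphabetically first label.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical weights, e.g. each base classifier's validation F-measure.
weights = {"naive_bayes": 0.80, "svm": 0.90, "knn": 0.75}

# Hypothetical votes for two tokens.
agreeing_majority = {"naive_bayes": "LOCATION", "svm": "PERSON", "knn": "LOCATION"}
three_way_split = {"naive_bayes": "O", "svm": "WEAPON", "knn": "PERSON"}

print(weighted_vote(agreeing_majority, weights))
print(weighted_vote(three_way_split, weights))
```

In the first case two weaker classifiers together outweigh the strongest one. In the second, where all three disagree, the single highest-weighted classifier decides the label.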
Similar resources
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
PAYMA: A Tagged Corpus of Persian Named Entities
The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...
Cross domain Chinese speech understanding and answering based on named-entity extraction
The Chinese language is not alphabetic, has a flexible wording structure, and sees a large number of domain-specific terms generated every day for each domain. In this paper, a new approach for cross-domain Chinese speech understanding and answering is proposed based on named-entity extraction. This approach includes two parts: a speech query recognition (SQR) part and a speech understanding and answering (...
Named Entity Recognition through Redundancy Driven Classifiers
We present Typhoon, a classifier combination system for Named Entity Recognition (NER), in which two different classifiers are combined to exploit Data Redundancy and Patterns extracted from a large text corpus. Data Redundancy is attained when the same entity occurs in different places in documents, whereas Patterns are 2-grams, 3-grams, 4-grams, and 5-grams preceding and following entities in...
Voted NER System using Appropriate Unlabeled Data
This paper reports a voted Named Entity Recognition (NER) system that uses appropriate unlabeled data. The proposed method is based on classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM) and has been tested for Bengali. The system makes use of language-independent features in the form of different contextual and orthographic wo...